1 Code optimization - 1: Compiler optimization-space exploration

Spyridon Triantafyllis, Manish Vachharajani, Neil Vachharajani, David I. August 

March 2003 Proceedings of the international symposium on Code generation and
optimization: feedback-directed and runtime optimization CGO '03
Publisher: IEEE Computer Society 

To meet the demands of modern architectures, optimizing compilers must incorporate an 
ever larger number of increasingly complex transformation algorithms. Since code 
transformations may often degrade performance or interfere with subsequent 
transformations, compilers employ predictive heuristics to guide optimizations by 
predicting their effects a priori. Unfortunately, the unpredictability of optimization 
interaction and the irregularity of today's wide-issue machines severely limit the accura ...
2 Low Power Embedded Software Optimization Using Symbolic Algebra
A. Peymandoust, T. Simunic, G. de Micheli 

March 2002 Proceedings of the conference on Design, automation and test in Europe
DATE '02

Publisher: IEEE Computer Society 

The market demand for portable multimedia applications has exploded in recent
years. Unfortunately, for such applications current compilers and software optimization
methods often require designers to do part of the optimization manually. Specifically,
the high-level arithmetic optimizations and the use of complex instructions are left to the
designers' ingenuity. In this paper, we present a tool flow, SymSoft, that automates
the optimization of power-intensive algorithmic constructs using symbolic a ...

3 High-level power estimation (invited talks): Source code optimization and profiling of
energy consumption in embedded systems

Tajana Simunic, Luca Benini, Giovanni De Micheli, Mat Hans 

September 2000 Proceedings of the 13th international symposium on System
synthesis ISSS '00
Publisher: IEEE Computer Society 

This paper presents a source code optimization methodology and a profiling tool that have 
been developed to help designers in optimizing software performance and energy in 
embedded systems. Code optimizations are applied at three levels of abstraction: 
algorithmic, data and instruction-level. The profiler exploits a cycle-accurate energy 
consumption simulator [3] to relate the embedded system energy consumption and 
performance to the source code. Thus, it can be used for analysis (i.e., to find ...

4 Code compression: Compiler optimization and ordering effects on VLIW code
compression

Montserrat Ros, Peter Sutton 

October 2003 Proceedings of the 2003 international conference on Compilers, 
architecture and synthesis for embedded systems CASES '03 

Publisher: ACM Press 

Code size has always been an important issue for all embedded applications as well as 
larger systems. Code compression techniques have been devised as a way of battling 
bloated code; however, the impact of VLIW compiler methods and outputs on these 
compression schemes has not been thoroughly investigated. This paper describes the 
application of single- and multiple-instruction dictionary methods for code compression to 
decrease overall code size for the TI TMS320C6xxx DSP family. The compression ... 

Keywords: VLIW, code compression, compiler optimizations 
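
The single-instruction dictionary idea named in the abstract can be illustrated with a small sketch. It assumes instruction words are plain integers, a hypothetical 8-bit dictionary index, and a 1-bit flag per word marking whether it is a dictionary index or an uncompressed instruction; these parameters and the size model are assumptions, not the encoding used for the TMS320C6xxx in the paper. Multiple-instruction (sequence) dictionaries are not shown.

```python
from collections import Counter
from typing import List, Tuple

# Each compressed item is ("idx", dictionary_index) or ("raw", instruction_word).
Item = Tuple[str, int]

WORD_BITS = 32   # assumed width of an uncompressed instruction
INDEX_BITS = 8   # assumed (hypothetical) width of a dictionary index


def build_dictionary(program: List[int], size: int) -> List[int]:
    """Choose the `size` most frequent instruction words as dictionary entries."""
    if size > 2 ** INDEX_BITS:
        raise ValueError("dictionary larger than the index field can address")
    return [word for word, _ in Counter(program).most_common(size)]


def compress(program: List[int], dictionary: List[int]) -> List[Item]:
    """Replace every instruction found in the dictionary by its index."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [("idx", index[w]) if w in index else ("raw", w) for w in program]


def decompress(items: List[Item], dictionary: List[int]) -> List[int]:
    """Expand indices back to instruction words; raw words pass through."""
    return [dictionary[v] if kind == "idx" else v for kind, v in items]


def compressed_bits(items: List[Item], dictionary: List[int]) -> int:
    """Rough size model: flag bit + payload per item, plus the stored dictionary."""
    body = sum(1 + (INDEX_BITS if kind == "idx" else WORD_BITS) for kind, _ in items)
    return body + len(dictionary) * WORD_BITS


if __name__ == "__main__":
    program = [0x1000, 0x2000, 0x1000, 0x3000] * 25 + [0x4000]
    d = build_dictionary(program, size=3)
    packed = compress(program, d)
    assert decompress(packed, d) == program
    print(compressed_bits(packed, d), "bits vs", len(program) * WORD_BITS, "bits")
```

For this toy 101-word program the model gives 1029 bits instead of 3232; real schemes must also account for alignment and branch targets, which the sketch ignores.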



5 Optimizing for reduced code space using genetic algorithms
Keith D. Cooper, Philip J. Schielke, Devika Subramanian 

May 1999 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1999 workshop
on Languages, compilers, and tools for embedded systems LCTES '99,
Volume 34 Issue 7
Publisher: ACM Press 

Code space is a critical issue facing designers of software for embedded systems. Many 
traditional compiler optimizations are designed to reduce the execution time of compiled 
code, but not necessarily the size of the compiled code. Further, different results can be 
achieved by running some optimizations more than once and changing the order in which 
optimizations are applied. Register allocation only complicates matters, as the interactions 
between different optimizations can cause more spill c ... 

6 Compiler analysis and optimization: General loop fusion technique for nested loops
considering timing and code size

Meilin Liu, Qingfeng Zhuge, Zili Shao, Edwin H.-M. Sha 

September 2004 Proceedings of the 2004 international conference on Compilers,
architecture, and synthesis for embedded systems CASES '04

Publisher: ACM Press 

Loop fusion is commonly used to improve the instruction-level parallelism of loops for 
high-performance embedded computing systems. Loop fusion, however, is not always 
directly applicable because the fusion prevention dependencies may exist among loops. 
Most of the existing techniques still have limitations in fully exploiting the advantages of 
loop fusion. In this paper, we present a general loop fusion technique for loops or nested 
loops based on the loop dependency graph model, retiming, and ... 

Keywords: code size, embedded DSP, loop fusion, retiming, scheduling 
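
A fusion-preventing dependence of the kind the abstract mentions, and one simple way around it, can be sketched as follows. The producer/consumer loops and array contents are hypothetical; the fix shown is a plain one-iteration shift of the consumer loop, which only conveys the intuition behind retiming-based fusion and is not the authors' algorithm.

```python
from typing import List


def unfused(n: int) -> List[int]:
    """Two separate loops: the consumer reads a[i + 1] written by the producer."""
    a = [0] * n
    for i in range(n):              # loop 1 (producer)
        a[i] = 3 * i + 1
    b = [0] * max(n - 1, 0)
    for i in range(n - 1):          # loop 2 (consumer)
        b[i] = a[i] + a[i + 1]
    return b


def naive_fused(n: int) -> List[int]:
    """Illegal fusion: a[i + 1] is read before the producer has written it."""
    a = [0] * n
    b = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = 3 * i + 1
        if i < n - 1:
            b[i] = a[i] + a[i + 1]  # a[i + 1] is still 0 at this point
    return b


def shifted_fused(n: int) -> List[int]:
    """Legal fusion: the consumer body runs one iteration later (index i - 1)."""
    a = [0] * n
    b = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = 3 * i + 1
        if i >= 1:
            b[i - 1] = a[i - 1] + a[i]  # both operands are already computed
    return b
```

Shifting the consumer turns the forward (fusion-preventing) dependence into a backward one that the fused loop respects.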



7 Optimizing Address Code Generation for Array-Intensive DSP Applications
Guilin Chen, Mahmut Kandemir 

March 2005 Proceedings of the international symposium on Code generation and 
optimization CGO '05 

Publisher: IEEE Computer Society 

The application code size is a critical design factor for many embedded systems.
Unfortunately, most available compilers optimize primarily for speed of execution rather
than code density. As a result, the compiler-generated code can be much larger than
necessary. In particular, in the DSP domain, past research found that optimizing
address code generation can be very important since address code can account for over 
50% of all program bits. This paper presents a compiler-directed scheme to ... 

8 Ada software engineering and optimized code
Gary Frankel

July 1989 Proceedings of the conference on TRI-Ada '88 TRI-Ada '88 
Publisher: ACM Press 

9 Pipelined Execution of Critical Sections Using Software-Controlled Caching in
Network Processors

Jinquan Dai, Long Li, Bo Huang 

March 2007 Proceedings of the International Symposium on Code Generation and 
Optimization CGO '07 

Publisher: IEEE Computer Society 

To keep up with the explosive internet packet processing demands, modern network 
processors (NPs) employ a highly parallel, multi-threaded and multi-core architecture. In 
such a parallel paradigm, accesses to the shared variables in the external memory (and 
the associated memory latency) are contained in the critical sections, so that they can be 
executed atomically and sequentially by different threads in the network processor. In this 
paper, we present a novel program transformation that is us ... 

10 Critical path reduction for scalar programs
Michael Schlansker, Vinod Kathail 

December 1995 Proceedings of the 28th annual international symposium on
Microarchitecture MICRO 28
Publisher: IEEE Computer Society Press 

11 Short presentations with posters: SCCP/x: a compilation profile to support testing and
verification of optimized code

Raimund Kirner

September 2007 Proceedings of the 2007 international conference on Compilers,
architecture, and synthesis for embedded systems CASES '07
Publisher: ACM Press 

Embedded systems are often used in safety-critical environments. Thus, thorough testing 
of them is mandatory. A quite active research area is the automatic test-case generation 
for testing embedded systems. To achieve high retargetability of the testing framework, 
the test-case generation has to be done at source-code level. However, it is challenging to
guarantee that the test-cases obtained from the source code are also valid at the
object-code level, since even in safety-critical domains pr ...

Keywords: code transformation, compiler, decision coverage, optimization, structural 
code-coverage preservation, testing 
translator

Chaohao Xu, Jianhui Li, Tao Bao, Yun Wang, Bo Huang 

June 2007 Proceedings of the 3rd international conference on Virtual execution 
environments VEE '07 

Publisher: ACM Press 

A dynamic binary translator translates and runs source-architecture binaries on a target
architecture at runtime. Despite their growing popularity, practical dynamic binary
translators usually suffer from the limited optimizations performed when generating the
translated code, owing to the lack of useful information in the executable files and the
requirement to preserve binary-level compatibility. Trying to generate more efficient
translated code, we p ...

Keywords: dynamic binary translator, memory optimizations, metadata 



13 A study of source-level compiler algorithms for automatic construction of pre-execution
code

Dongkeun Kim, Donald Yeung 

August 2004 ACM Transactions on Computer Systems (TOCS), Volume 22 issue 3 
Publisher: ACM Press 

Pre-execution is a promising latency tolerance technique that uses one or more helper 
threads running in spare hardware contexts ahead of the main computation to trigger 
long-latency memory operations early, hence absorbing their latency on behalf of the 
main computation. This article investigates several source-to-source C compilers for 
extracting pre-execution thread code automatically, thus relieving the programmer or 
hardware from this onerous task. We present an aggressive profile-driven co ... 

Keywords: Data prefetching, memory-level parallelism, multithreading, pre-execution, 
prefetch conversion, program slicing, speculative loop parallelization 



14 Optimized code restructuring of OS/2 executables

Jyh-Herng Chow, Yong-fong Lee, Kalyan Muthukumar, Vivek Sarkar, Mauricio Serrano, Iris 
Garcia, John Hsu, Shauchi Ong, Honesty Young 

November 1995 Proceedings of the 1995 conference of the Centre for Advanced
Studies on Collaborative research CASCON '95
Publisher: IBM Press 

This paper describes the design and algorithms of FDPR/2 (Feedback Directed Program 
Restructuring of OS/2 executables), a general-purpose tool that can be used to 
instrument, profile, and restructure/optimize OS/2 executables for the Intel x86
architecture. The optimizations delivered by FDPR/2's restructuring include improved 
utilization of the (instruction) memory hierarchy, improved branch alignment, and dead 
code elimination. These optimizations are known to be critical for object-oriented pro ... 

15 Improving WCET by applying a WC code-positioning optimization
Wankang Zhao, David Whalley, Christopher Healy, Frank Mueller 

December 2005 ACM Transactions on Architecture and Code Optimization (TACO),
Volume 2 Issue 4
Publisher: ACM Press 

Applications in embedded systems often need to meet specified timing constraints. It is 
advantageous to not only calculate the worst-case execution time (WCET) of an 
application, but also to perform transformations that reduce the WCET, since an
application with a lower WCET will be less likely to violate its timing constraints. Some 
processors incur a pipeline delay whenever an instruction transfers control to a target that 
is not the next sequential instruction. Code-positioning optimizations ... 

Keywords: WCET, code positioning, embedded systems 



16 Rapid development of optimized DSP code from a high level description through
software estimations

Alain Pegatoquet, Emmanuel Gresset, Michel Auguin, Luc Bianco 

June 1999 Proceedings of the 36th ACM/IEEE conference on Design automation DAC '99

Publisher: ACM Press 

Keywords: DSP, code generation, performance estimation 



17 Architectural design for embedded systems: Design space minimization with timing
and code size optimization for embedded DSP
Qingfeng Zhuge, Zili Shao, Bin Xiao, Edwin H.-M. Sha 

October 2003 Proceedings of the 1st IEEE/ACM/IFIP international conference on
Hardware/software codesign and system synthesis CODES+ISSS '03
Publisher: ACM Press 

One of the most challenging problems in high-level synthesis is how to quickly explore a 
wide range of design options to achieve high-quality designs. This paper presents an 
Integrated Framework for Design Optimization and Space Minimization (IDOM) towards 
finding the minimum configuration satisfying timing and code size constraints. We show 
an effective way to reduce the design space to be explored through the study of the 
fundamental properties and relations among multiple design parameters, s ... 

Keywords: DSP processors, code size reduction, retiming, unfolding 



18 Source-level global optimizations for fine-grain distributed shared memory systems
R. Veldema, R. F. H. Hofman, R. A. F. Bhoedjang, C. J. H. Jacobs, H. E. Bal 
June 2001 ACM SIGPLAN Notices, Proceedings of the eighth ACM SIGPLAN
symposium on Principles and practices of parallel programming PPoPP
'01, Volume 36 Issue 7
Publisher: ACM Press 

This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine- 
grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, 
source-level compiler rather than the binary rewriting techniques employed by most other 
fine-grain DSM systems. Source-level analysis makes existing access-check optimizations 
(e.g., access-check batching) more effective and enables two novel fine-grain DSM 
optimizations: object-graph aggregatio ... 



19 Interaction cost and shotgun profiling

Brian A. Fields, Rastislav Bodik, Mark D. Hill, Chris J. Newburn

September 2004 ACM Transactions on Architecture and Code Optimization (TACO),
Volume 1 Issue 3
Publisher: ACM Press 
We observe that the challenges software optimizers and microarchitects face every day
boil down to a single problem: bottleneck analysis. A bottleneck is any event or resource 
that contributes to execution time, such as a critical cache miss or window stall. Tasks 
such as tuning processors for energy efficiency and finding the right loads to prefetch all 
require measuring the performance costs of bottlenecks. In the past, simple event counts 
were enough to find the important bottlenecks. Today, t ... 

Keywords: Performance analysis, critical path, modeling, profiling 



20 Partial dead code elimination using slicing transformations



May 1997 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1997 conference



We present an approach for optimizing programs that uncovers additional opportunities 
for optimization of a statement by predicating the statement. In this paper predication 
algorithms for achieving partial dead code elimination (PDE) are presented. The process of 
predication embeds a statement in a control flow structure such that the statement is 
executed only if the execution follows a path along which the value computed by the 
statement is live. The control flow restructuring performe ... 
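
The effect the abstract describes, a statement executing only on paths where its value is live, can be shown in its simplest form: a statement whose result is used on just one branch is moved under that branch's condition. This sketch assumes the statement has no side effects (the counter is instrumentation only), and it shows the end result of partial dead code elimination, not the slicing-based predication algorithm of the paper.

```python
from typing import Callable, List, Tuple


def make_stmt() -> Tuple[Callable[[int], int], List[int]]:
    """Return a side-effect-free statement plus a counter of how often it runs."""
    executions = [0]

    def stmt(a: int) -> int:
        executions[0] += 1          # instrumentation only, used as a cost measure
        return a * a + 1

    return stmt, executions


def before(a: int, flag: bool, stmt: Callable[[int], int]) -> int:
    x = stmt(a)                     # partially dead: x is live only when flag is true
    if flag:
        return x + 1
    return a


def after(a: int, flag: bool, stmt: Callable[[int], int]) -> int:
    if flag:                        # statement now guarded by the path on which x is live
        x = stmt(a)
        return x + 1
    return a


if __name__ == "__main__":
    inputs = [(2, True), (3, False), (4, False), (5, True)]
    s1, n1 = make_stmt()
    s2, n2 = make_stmt()
    assert [before(a, f, s1) for a, f in inputs] == [after(a, f, s2) for a, f in inputs]
    print(n1[0], n2[0])             # executions before vs after the transformation
```

Both versions return the same results, but the statement runs four times before the transformation and only twice after it.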
1 A compiler algorithm for optimizing locality in loop nests
M. Kandemir, J. Ramanujam, A. Choudhary 

July 1997 Proceedings of the 11th international conference on Supercomputing ICS '97

Publisher: ACM Press 

2 A hyperplane based approach for optimizing spatial locality in loop nests
M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, J. Ramanujam 

July 1998 Proceedings of the 12th international conference on Supercomputing ICS '98

Publisher: ACM Press 

3 Optimized unrolling of nested loops
Vivek Sarkar 

May 2000 Proceedings of the 14th international conference on Supercomputing ICS '00

Publisher: ACM Press 

In this paper, we address the problems of automatically selecting unroll factors for 
perfectly nested loops, and generating compact code for the selected unroll factors. 
Compared to past work, the contributions of our work include a) a more detailed cost 
model that includes ILP and I-cache considerations, b) a new code generation algorithm
for unrolling nested loops that generates more compact code (with fewer remainder 
loops) than the unroll-and-jam transf ... 
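
For readers unfamiliar with unroll-and-jam, a minimal sketch with a fixed unroll factor of 2 on a matrix-vector product may help. The kernel, the factor, and the integer data are assumptions for illustration; the paper's contribution is choosing unroll factors with a cost model and generating more compact code than this, which handles leftover rows with a separate remainder loop.

```python
from typing import List

Matrix = List[List[int]]


def matvec(A: Matrix, x: List[int]) -> List[int]:
    """Reference loop nest: y[i] = sum_j A[i][j] * x[j]."""
    n, m = len(A), len(x)
    y = [0] * n
    for i in range(n):
        for j in range(m):
            y[i] += A[i][j] * x[j]
    return y


def matvec_unroll_jam2(A: Matrix, x: List[int]) -> List[int]:
    """Outer loop unrolled by 2, the two inner-loop copies jammed together."""
    n, m = len(A), len(x)
    y = [0] * n
    i = 0
    while i + 1 < n:                # main part: two rows per outer iteration
        for j in range(m):
            xj = x[j]               # one load of x[j] serves both rows
            y[i] += A[i][j] * xj
            y[i + 1] += A[i + 1][j] * xj
        i += 2
    while i < n:                    # remainder loop for an odd row count
        for j in range(m):
            y[i] += A[i][j] * x[j]
        i += 1
    return y
```

Jamming the inner-loop copies is what gives the register reuse of x[j]; the remainder loop is the extra code that a more compact generation scheme tries to avoid.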

4 A compiler optimization to reduce execution time of loop nest
Oh-Young Kwon, Gi-Ho Park, Tack-Don Han 

March 1996 ACM SIGARCH Computer Architecture News, Volume 24 Issue 1
Publisher: ACM Press 

In this paper, a compiler optimization to reduce the execution time of loop nests is
proposed. Loop tiling is used to optimize the loop nest. Loop tiling is a well-known
optimization for improving locality. However, it has a side effect: it increases the
number of loop-control instructions. These extra instructions counteract the benefit of
the locality optimization. Therefore, in this paper the innermost loop is not tiled, in order
to reduce the loop-control instructions. This opti ...
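
The variant described above, tiling the outer loops of a nest while leaving the innermost loop untiled so that it carries no extra loop-control overhead, can be sketched on matrix multiplication. The i-k-j loop order, the default tile size of 2, and the integer matrices are assumptions; Python will not exhibit the cache behavior, only the loop structure.

```python
from typing import List

Matrix = List[List[int]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Untiled reference in i-k-j order (j innermost)."""
    n, p, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for k in range(p):
            for j in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled_outer(A: Matrix, B: Matrix, tile: int = 2) -> Matrix:
    """Tile the i and k loops only; the innermost j loop is left untiled."""
    n, p, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for ii in range(0, n, tile):                     # tile loop over rows of A
        for kk in range(0, p, tile):                 # tile loop over the shared dimension
            for i in range(ii, min(ii + tile, n)):
                for k in range(kk, min(kk + tile, p)):
                    a_ik = A[i][k]
                    for j in range(m):               # innermost loop: no tiling overhead
                        C[i][j] += a_ik * B[k][j]
    return C
```

The `min(...)` bounds handle partial tiles at the edges; these are exactly the loop-control costs that leaving the innermost loop untiled keeps out of the hottest loop.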

5 Automatic memory layout transformations to optimize spatial locality in parameterized
loop nests

Philippe Clauss, Benoit Meister

March 2000 ACM SIGARCH Computer Architecture News, Volume 28 Issue 1
Publisher: ACM Press 

One of the most efficient ways to improve program performance on today's
computers is to optimize the way cache memories are used. In particular, many scientific
applications contain loop nests that operate on large multi-dimensional arrays whose sizes 
are often parameterized. No special attention is paid to cache memory performance when 
such loops are written. In this work, we focus on spatial locality optimization such that all 
the data that are loaded as a block in the cache will be used ... 

Keywords: Ehrhart polynomials, cache memory, loop nests, optimizing compiler, 
parameterized polyhedron, program performance optimization, spatial locality 



6 Session S4.2: program transformation: Optimizing inter-nest data locality
M. Kandemir, I. Kadayif, A. Choudhary, J. A. Zambreno 

October 2002 Proceedings of the 2002 international conference on Compilers,
architecture, and synthesis for embedded systems CASES '02
Publisher: ACM Press 

By examining data reuse patterns of four array-intensive embedded applications, we 
found that these codes exhibit a significant amount of inter-nest reuse (i.e., the data
reuse that occurs between different nests). While traditional compiler techniques that
target array-intensive applications can exploit intra-nest data reuse, there has not been
much success in the past in taking advantage of inter-nest data reuse. In this paper, we
present a compiler strategy that optimizes inter-nest reuse usi ... 

Keywords: array-intensive codes, cache locality, data reuse, embedded applications, 
inter-nest optimization 



7 Quantifying loop nest locality using SPEC'95 and the Perfect benchmarks
Kathryn S. McKinley, Olivier Temam 

November 1999 ACM Transactions on Computer Systems (TOCS), Volume 17 Issue 4
Publisher: ACM Press 

This article analyzes and quantifies the locality characteristics of numerical loop nests in 
order to suggest future directions for architecture and software cache optimizations. Since 
most programs spend the majority of their time in nests, the vast majority of cache 
optimization techniques target loop nests. In contrast, the locality characteristics that 
drive these optimizations are usually collected across the entire application rather than at 
the nest level. Researchers have studied nu ... 

8 A quantitative analysis of loop nest locality
Kathryn S. McKinley, Olivier Temam

September 1996 ACM SIGPLAN Notices, ACM SIGOPS Operating Systems Review,
Proceedings of the seventh international conference on Architectural
support for programming languages and operating systems ASPLOS-VII,
Volume 31 Issue 9 (SIGPLAN Notices), Volume 30 Issue 5 (SIGOPS Operating Systems Review)
This paper analyzes and quantifies the locality characteristics of numerical loop nests in
order to suggest future directions for architecture and software cache optimizations. Since
most programs spend the majority of their time in nests, the vast majority of cache
optimization techniques target loop nests. In contrast, the locality characteristics that
drive these optimizations are usually collected across the entire application rather than at
the nest level. Indeed, researchers have studied nume ... 

9 Control Flow Driven Splitting of Loop Nests at the Source Code Level
Heiko Falk, Peter Marwedel 

March 2003 Proceedings of the conference on Design, Automation and Test in Europe 
- Volume 1 DATE '03 

Publisher: IEEE Computer Society 

This paper presents a novel source code transformation for control flow optimization 
called loop nest splitting which minimizes the number of executed if-statements in loop 
nests of embedded multimedia applications. The goal of the optimization is to reduce 
runtimes and energy consumption. The analysis techniques are based on precise 
mathematical models combined with genetic algorithms. Due to the inherent portability of 
source code transformations, a very detailed benchmarking using 10 differen ... 
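
The idea of loop nest splitting, removing if-statements whose conditions depend only on loop indices, can be illustrated with a small sketch. The condition `x >= 10 or y >= 14` and the loop bounds are hypothetical, and the split is derived by hand here, whereas the paper derives it automatically from mathematical models combined with genetic algorithms.

```python
from typing import List, Tuple

Trace = List[Tuple[str, int, int]]


def original(nx: int, ny: int) -> Tuple[Trace, int]:
    """Loop nest with an index-dependent if-statement in the innermost body."""
    trace: Trace = []
    tests = 0
    for x in range(nx):
        for y in range(ny):
            tests += 1                      # one if-statement per inner iteration
            if x >= 10 or y >= 14:
                trace.append(("then", x, y))
            else:
                trace.append(("else", x, y))
    return trace, tests


def split(nx: int, ny: int) -> Tuple[Trace, int]:
    """Same iterations in the same order, with the test hoisted out of the y loop."""
    trace: Trace = []
    tests = 0
    for x in range(nx):
        tests += 1                          # one if-statement per outer iteration
        if x >= 10:
            # The condition holds for every y, so the inner loop needs no test.
            for y in range(ny):
                trace.append(("then", x, y))
        else:
            # Split the y range at 14: below it the condition is false, above it true.
            for y in range(min(14, ny)):
                trace.append(("else", x, y))
            for y in range(14, ny):
                trace.append(("then", x, y))
    return trace, tests
```

For a 36 x 49 nest the split version executes 36 condition tests instead of 1764 while producing the identical sequence of then/else bodies.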

10 Integrating loop and data optimizations for locality within a constraint network based
framework

Guilin Chen, O. Ozturk, M. Kandemir, I. Kolcu 

May 2005 Proceedings of the 2005 IEEE/ACM International conference on 
Computer-aided design ICCAD '05 

Publisher: IEEE Computer Society 

In the context of data-intensive embedded applications, there have been two
complementary approaches to the data locality problem: restructuring code and restructuring
data layout. Conceivably, an integrated approach that combines these two can generate 
much better results than each individual approach. However, there is an inherent difficulty 
in optimizing both data layout and loop access pattern simultaneously under a unified 
setting. This difficulty occurs due to the fact that a given data struct ... 

11 Optimization of array accesses by collective loop transformations
Vivek Sarkar, Guang R. Gao

June 1991 Proceedings of the 5th international conference on Supercomputing ICS
Publisher: ACM Press 

12 Blocking and array contraction across arbitrarily nested loops using affine partitioning
Amy W. Lim, Shih-Wei Liao, Monica S. Lam 

June 2001 ACM SIGPLAN Notices, Proceedings of the eighth ACM SIGPLAN
symposium on Principles and practices of parallel programming PPoPP
'01, Volume 36 Issue 7
Publisher: ACM Press 

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program 
transformation framework that unifies many previously proposed loop transformations, 
including unimodular transforms, fusion, fission, reindexing, scaling and statement
reordering. Algorithms based on affine partitioning have been shown to be effective for
parallelization and communication minimization. This paper presents algorithms that 
improve data locality using affine partitioning. Blockin ... 

13 Compiler analysis and optimization: General loop fusion technique for nested loops
considering timing and code size

Meilin Liu, Qingfeng Zhuge, Zili Shao, Edwin H.-M. Sha

September 2004 Proceedings of the 2004 international conference on Compilers, 
architecture, and synthesis for embedded systems CASES '04 

Publisher: ACM Press 

Loop fusion is commonly used to improve the instruction-level parallelism of loops for 
high-performance embedded computing systems. Loop fusion, however, is not always 
directly applicable because the fusion prevention dependencies may exist among loops. 
Most of the existing techniques still have limitations in fully exploiting the advantages of 
loop fusion. In this paper, we present a general loop fusion technique for loops or nested 
loops based on the loop dependency graph model, retiming, and ... 

Keywords: code size, embedded DSP, loop fusion, retiming, scheduling 



14 Loop optimization for a class of memory-constrained computations

D. Cociorva, J. W. Wilkins, C. Lam, G. Baumgartner, J. Ramanujam, P. Sadayappan
June 2001 Proceedings of the 15th international conference on Supercomputing ICS '01

Publisher: ACM Press 



Compute-intensive multi-dimensional summations that involve products of several arrays
arise in the modeling of electronic structure of materials. Sometimes several alternative 
formulations of a computation, representing different space-time trade-offs, are possible. 
By computing and storing some intermediate arrays, reduction of the number of 
arithmetic operations is possible, but the size of intermediate temporary arrays may be 
prohibitively large. Loop fusion can be applied to reduce memor ... 
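
How loop fusion reduces memory in such computations can be sketched on a two-step summation E = sum_i (sum_j A[i][j] * B[j]) * C[i]. The expression and data are hypothetical and far simpler than the multi-dimensional summations the paper targets; the point is only that fusing the producer and consumer loops over i lets the intermediate array T shrink to a scalar (array contraction).

```python
from typing import List, Tuple

Matrix = List[List[int]]


def unfused(A: Matrix, B: List[int], C: List[int]) -> Tuple[int, int]:
    """Two separate i loops; returns (result, elements of temporary storage)."""
    n = len(A)
    T = [0] * n                         # intermediate array of size n
    for i in range(n):                  # producer loop
        for j in range(len(B)):
            T[i] += A[i][j] * B[j]
    E = 0
    for i in range(n):                  # consumer loop
        E += T[i] * C[i]
    return E, len(T)


def fused(A: Matrix, B: List[int], C: List[int]) -> Tuple[int, int]:
    """The two i loops fused, so T[i] is consumed right after it is produced."""
    E = 0
    for i in range(len(A)):
        t = 0                           # T contracted to a single scalar
        for j in range(len(B)):
            t += A[i][j] * B[j]
        E += t * C[i]
    return E, 1
```

The fused form trades nothing in arithmetic here; in the paper's setting the interesting cases are those where fusion and storage reduction conflict with operation-count minimization.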

15 A preprocessing step for global loop transformations for data transfer optimization
Koen Danckaert, Francky Catthoor, Hugo De Man

November 2000 Proceedings of the 2000 international conference on Compilers,
architecture, and synthesis for embedded systems CASES '00 
Publisher: ACM Press 

16 Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction
Expressions

Swarup Kumar Sahoo, Sriram Krishnamoorthy, Rajkiran Panuganti, P. Sadayappan
November 2005 Proceedings of the 2005 ACM/IEEE conference on Supercomputing SC 
A very challenging issue for optimizing compilers is the phase ordering problem: In what
order should a collection of compiler optimizations be performed? We address this 
problem in the context of optimizing a sequence of tensor contractions. The pertinent loop 
transformations are loop permutation, tiling, and fusion; in addition, the placement of disk 
I/O statements crucially affects performance. The space of possible combinations is 
exponentially large. We develop novel pruning strategies wher ... 
Today, portable and embedded systems host increasingly complex applications such as
multimedia. These applications and submicron technologies have made power
consumption a crucial criterion. We propose new techniques with which we can
optimize the behavioral description of an integrated system before the hardware/software
partitioning (Codesign). These transformations are performed on "for" loops that
constitute the main parts of the multimedia code which handle the arrays. We present in
t ...

18 A transformation-based approach to optimizing loops in database programming
Daniel F. Lieuwen, David J. DeWitt

June 1992 ACM SIGMOD Record , Proceedings of the 1992 ACM SIGMOD international 
Database programming languages like O2, E, and O++ include the ability to iterate
through a set. Nested iterators can be used to express joins. This paper describes 
compile-time optimizations similar to relational transformations like join reordering for 
such programming constructs. This paper also shows how to use a standard 
transformation-based optimizer to optimize these joins. An optimizer built using the 
EXODUS Opt ... 

19 Handling irreducible loops: optimized node splitting versus DJ-graphs
Sebastian Unger, Frank Mueller
Volume 24 Issue 4
This paper addresses the question of how to handle irreducible regions during
optimization, which has become even more relevant for contemporary processors since
recent VLIW-like architectures rely heavily on instruction scheduling. The contributions of
this paper are twofold. First, a method of optimized node splitting to transform irreducible 
regions of control flow into reducible regions is formally defined and its correctness is 
shown. This method is superior to approaches previously publishe ... 

Keywords: Code optimization, compilation, control flow graphs, instruction-level 
parallelism, irreducible flowgraphs, loops, node splitting, reducible flowgraphs 
Irregular loop nests in which the loop bounds are determined dynamically by indexed
arrays are difficult to compile into expressive parallel constructs, such as segmented 
scans and reductions. In this paper, we describe a suite of transformations to 
automatically parallelize such irregular loop nests, even in the presence of recurrences. 
We describe a simple, general loop flattening transformation, along with new 
optimizations which make it a viable compiler transformation. A robust recurre ... 
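To make the idea of loop flattening concrete, here is a minimal Python sketch, assuming the irregular inner bounds come from a CSR-style `offsets` array (the names `offsets`, `values` and both functions are hypothetical). The nested version computes one sum per segment; the flattened version walks every element in a single loop and advances a segment id at each boundary, which is the shape that maps onto a segmented scan or reduction. It illustrates the effect of the transformation, not the authors' algorithm or their recurrence handling.

```python
from typing import List


def nested_segment_sums(offsets: List[int], values: List[int]) -> List[int]:
    """Original irregular nest: inner bounds are read from the indexed array `offsets`."""
    sums = []
    for seg in range(len(offsets) - 1):
        total = 0
        for j in range(offsets[seg], offsets[seg + 1]):
            total += values[j]
        sums.append(total)
    return sums


def flattened_segment_sums(offsets: List[int], values: List[int]) -> List[int]:
    """Flattened form: one loop over all elements plus a segment id,
    i.e. a sequential segmented reduction."""
    num_segs = len(offsets) - 1
    sums = [0] * max(num_segs, 0)
    if num_segs <= 0:
        return sums
    seg = 0
    for j in range(offsets[0], offsets[-1]):
        # Move past every segment that ends at or before j (this also skips empty segments).
        while offsets[seg + 1] <= j:
            seg += 1
        sums[seg] += values[j]
    return sums
```

For `offsets = [0, 2, 2, 5]` both functions return `[3, 0, 12]` on `values = [1, 2, 3, 4, 5]`; the empty middle segment is handled by the boundary test rather than by a separate inner loop.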
Compile-time minimisation of load imbalance in loop nests
Rizos Sakellariou, John R. Gurd

July 1997 Proceedings of the 11th international conference on Supercomputing ICS '97

Publisher: ACM Press 

Full text available: pdf (997.11 KB) Additional Information: full citation, references, index terms



2 A compiler algorithm for optimizing locality in loop nests
M. Kandemir, J. Ramanujam, A. Choudhary

July 1997 Proceedings of the 11th international conference on Supercomputing ICS 

Publisher: ACM Press 
The major source of parallelism in ordinary programs is DO loops. When loop iterations of
parallelized loops are executed on multiprocessors, the cross-iteration data dependencies
need to be enforced by synchronization between processors. Existing data synchronization
schemes are either too simple to handle general nested loop structures with non-trivial
array subscript functions or inefficient due to the large run-time overhead. In this paper,
we propose a new synchronization sch ...
'95, Volume 30 Issue 8
Publisher: ACM Press 
November 2000 Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Tiling is one of the more important transformations for enhancing locality of reference in
programs. Intuitively, tiling a set of loops achieves the effect of interleaving iterations of 
these loops. Tiling of perfectly-nested loop nests (which are loop nests in which all 
assignment statements are contained in the innermost loop) is well understood. In 
practice, many loop nests are imperfectly nested, so existing compilers use heuristics to 
try to find a sequence of transformations that con ... 
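Since the abstract takes tiling of perfectly nested loops as the well-understood baseline, a minimal Python sketch of that baseline may help; it does not show the paper's method for imperfectly nested loops. Both loops of a transpose are tiled: the outer pair steps over tile origins and the inner pair sweeps one tile, with `min()` clipping partial tiles at the edges. The tile size `tile` is a hypothetical parameter. In a compiled language the point is that each tile's rows and columns stay cache-resident; in Python the sketch only shows that the iteration order changes while the result does not.

```python
from typing import List

Matrix = List[List[float]]


def transpose_naive(a: Matrix) -> Matrix:
    n, m = len(a), len(a[0])
    b = [[0.0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            b[j][i] = a[i][j]
    return b


def transpose_tiled(a: Matrix, tile: int = 4) -> Matrix:
    n, m = len(a), len(a[0])
    b = [[0.0] * n for _ in range(m)]
    for ii in range(0, n, tile):          # tile origin in i
        for jj in range(0, m, tile):      # tile origin in j
            # Sweep one tile; min() handles partial tiles at the borders.
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    b[j][i] = a[i][j]
    return b
```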

7 A compiler optimization to reduce execution time of loop nest
Oh-Young Kwon, Gi-Ho Park, Tack-Don Han

March 1996 ACM SIGARCH Computer Architecture News, volume 24 issue 1
Publisher: ACM Press 

Full text available: pdf (333.41 KB) Additional Information: full citation, abstract, index terms

In this paper, a compiler optimization to reduce the execution time of loop nests is
proposed. Loop tiling, a well-known optimization for improving locality, is used to
optimize the loop nest. However, tiling has a side effect: it increases the number of
loop-control instructions, and these extra instructions offset part of the benefit of the
locality optimization. Therefore, in this paper the innermost loop is not tiled, in order to
reduce the number of loop-control instructions. This opti ...
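A sketch of the idea described here, written in Python under assumptions of our own: in a matrix multiply, the `i` and reduction `p` loops are tiled while the innermost `j` loop is left untiled, so it keeps a single set of loop-control instructions. The loop order, which loops are tiled, and the tile size `tile` are hypothetical choices, not the paper's. The reference uses the same `i, p, j` order so each element accumulates its terms in the same sequence.

```python
from typing import List

Matrix = List[List[float]]


def matmul_reference(a: Matrix, b: Matrix) -> Matrix:
    """Untiled triple loop in i, p, j order, for comparison."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            for j in range(m):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_outer_tiled(a: Matrix, b: Matrix, tile: int = 4) -> Matrix:
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, tile):              # tiled i loop
        for pp in range(0, k, tile):          # tiled reduction loop
            for i in range(ii, min(ii + tile, n)):
                for p in range(pp, min(pp + tile, k)):
                    aip = a[i][p]
                    # Innermost j loop is NOT tiled: it sweeps the whole row.
                    for j in range(m):
                        c[i][j] += aip * b[p][j]
    return c
```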
In this paper, we address the problems of automatically selecting unroll factors for
perfectly nested loops, and generating compact code for the selected unroll factors.
Compared to past work, the contributions of our work include a) a more detailed cost
model that includes ILP and I-cache considerations, b) a new code generation algorithm
for unrolling nested loops that generates more compact code (with fewer remainder 
loops) than the unroll-and-jam transf ... 
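For readers unfamiliar with unroll-and-jam, here is a minimal Python sketch on a matrix-vector product, assuming an unroll factor of 2 for the outer loop (a hypothetical choice; choosing such factors with a cost model is the paper's subject). The outer loop is unrolled by two and the two copies of the inner loop are fused ("jammed"), so each loaded `x[j]` serves two rows. The trailing remainder loop for an odd row count is exactly the kind of extra code the paper's generator tries to minimize.

```python
from typing import List


def matvec(a: List[List[int]], x: List[int]) -> List[int]:
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def matvec_unroll_jam2(a: List[List[int]], x: List[int]) -> List[int]:
    n, m = len(a), len(x)
    y = [0] * n
    i = 0
    # Main loop: outer loop unrolled by 2, the two inner loops jammed into one.
    while i + 1 < n:
        s0 = s1 = 0
        for j in range(m):
            xj = x[j]                  # one load of x[j] feeds both rows
            s0 += a[i][j] * xj
            s1 += a[i + 1][j] * xj
        y[i], y[i + 1] = s0, s1
        i += 2
    # Remainder loop: the leftover row when n is odd.
    if i < n:
        y[i] = sum(a[i][j] * x[j] for j in range(m))
    return y
```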
We present an approach for synthesizing transformations to enhance locality in
imperfectly-nested loops. The key idea is to embed the iteration space of every statement 
in a loop nest into a special iteration space called the product space. The product space 
can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like 
code sinking and loop fusion that are used in ad hoc ways in current compilers to produce 
perfectly-nested loops from imperfectly-n ... 

10 Parallel and distributed systems (PDS): Automatic parallel code generation for tiled nested loops

Georgios Goumas, Nikolaos Drosinos, Maria Athanasaki, Nectarios Koziris 

March 2004 Proceedings of the 2004 ACM symposium on Applied computing SAC '04 

Publisher: ACM Press 

Full text available: pdf (367.02 KB) Additional Information: full citation, abstract, references, citings

This paper presents an overview of our work, concerning a complete end-to-end 
framework for automatically generating message passing parallel code for tiled nested 
for-loops. It considers general parallelepiped tiling transformations and general convex 
iteration spaces. We address all problems regarding both the generation of sequential 
tiled code and its parallelization. We have implemented our techniques in a tool which 
automatically generates MPI parallel code and conducted several series of ... 

Keywords: MPI, automatic SPMD code generation, nested loops, parallelizing compilers, 
supernodes, tiling 
Loop fusion is commonly used to improve the instruction-level parallelism of loops for
high-performance embedded computing systems. Loop fusion, however, is not always
directly applicable because fusion-preventing dependences may exist among loops.
Most of the existing techniques still have limitations in fully exploiting the advantages of 
loop fusion. In this paper, we present a general loop fusion technique for loops or nested 
loops based on the loop dependency graph model, retiming, and ... 

Keywords: code size, embedded DSP, loop fusion, retiming, scheduling 
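The following hand-worked, one-dimensional Python example illustrates why a fusion-preventing dependence blocks direct fusion and how shifting (retiming) one loop removes it. It is a simplified sketch, not the paper's graph-based algorithm, and the arrays `x`, `a`, `b` are hypothetical. The second loop reads `a[i + 1]`, which a naively fused body would not have written yet, so the fused loop computes `a` one iteration ahead and a prologue produces the first element.

```python
from typing import List


def separate_loops(x: List[int]) -> List[int]:
    n = len(x)
    a = [0] * n
    b = [0] * max(n - 1, 0)
    for i in range(n):            # loop 1
        a[i] = 2 * x[i]
    for i in range(n - 1):        # loop 2 reads a[i + 1]: fusion-preventing
        b[i] = a[i] + a[i + 1]
    return b


def fused_retimed(x: List[int]) -> List[int]:
    n = len(x)
    a = [0] * n
    b = [0] * max(n - 1, 0)
    if n == 0:
        return b
    a[0] = 2 * x[0]               # prologue: first iteration of retimed loop 1
    for i in range(n - 1):        # fused body: loop 1 runs one iteration ahead
        a[i + 1] = 2 * x[i + 1]
        b[i] = a[i] + a[i + 1]
    return b
```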



12 Compiling nested data-parallel programs for shared-memory multiprocessors
Siddhartha Chatterjee

July 1993 ACM Transactions on Programming Languages and Systems (TOPLAS), 

Volume 15 Issue 3 
Publisher: ACM Press 

Full text available: pdf (4.17 MB) Additional Information: full citation, references, citings, index terms, review



Keywords: compilers, data parallelism, shared-memory multiprocessors

http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFID=8498393& 12/3/2007

Results (page 1): +nest* +loop +compil*



13 Compilers and Optimization: Combined partitioning and data padding for scheduling multiple loop nests

Zhong Wang, Edwin H.-M. Sha, Xiaobo (Sharon) Hu

November 2001 Proceedings of the 2001 international conference on Compilers, 

architecture, and synthesis for embedded systems CASES '01 
Publisher: ACM Press 
With the widening performance gap between processors and main memory, efficient
memory access behavior is necessary for good program performance. Loop partitioning is
an effective way to exploit data locality. Traditional loop partitioning techniques,
however, consider only a single nested loop. This paper presents a multiple loop
partition scheduling technique, which combines loop partitioning and data padding to
generate the detailed partition schedule. The computation and data prefetching ...
D. Neel, M. Amirchahy

January 1975 ACM SIGPLAN Notices, Proceedings of the conference on Programming
This document presents how one of the most important optimizations that a program may
undergo is dealt with by means of attributes [7]. A semantic formalization is given of the
classical method, which consists of removing all loop-independent statements from the
articulation blocks of a loop. The method is equally applicable to algebraic languages or to
their intermediate code: in a high-level language, even very well constructed programs
quite often contain in their intermediate code vers ...

Keywords: Articulation block, Basic block, Compiler optimization, Evaluation program, 
Invariant statement, Semantic attributes 
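A generic source-level before/after sketch of loop-invariant code motion in Python, not the attribute-based formalization this paper gives. The function names and the invariant expression `scale * offset` are hypothetical. The product does not depend on the loop variable, so it can be evaluated once before the loop; a compiler must also establish that the hoisted computation has no side effects and cannot fail, which holds trivially for integer multiplication here.

```python
from typing import List


def scale_before_licm(xs: List[int], scale: int, offset: int) -> List[int]:
    out = []
    for x in xs:
        k = scale * offset    # loop-invariant, yet recomputed on every iteration
        out.append(x * k)
    return out


def scale_after_licm(xs: List[int], scale: int, offset: int) -> List[int]:
    k = scale * offset        # hoisted: evaluated once before the loop
    out = []
    for x in xs:
        out.append(x * k)
    return out
```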
This article analyzes and quantifies the locality characteristics of numerical loop nests in
order to suggest future directions for architecture and software cache optimizations. Since 
most programs spend the majority of their time in nests, the vast majority of cache 
optimization techniques target loop nests. In contrast, the locality characteristics that 
drive these optimizations are usually collected across the entire application rather than at 
the nest level. Researchers have studied nu ... 

16 Compilers, supercomputing and quantum computing: A general approach for
partitioning N-dimensional parallel nested loops with conditionals

Arun Kejariwal, Alexandru Nicolau, Hideki Saito, Xinmin Tian, Milind Girkar, Utpal Banerjee, 
Constantine D. Polychronopoulos 

July 2006 Proceedings of the eighteenth annual ACM symposium on Parallelism in 
Parallel loops account for the greatest amount of parallelism in scientific and numerical
codes. For example, most of the DO loops in SPEC CFP2000 and SPEC OMPM2001 are of 
DOALL type and account for a large percentage of the total execution time. One of the 
ways to exploit parallelism is to partition the iteration space of a DOALL loop amongst 
different processors in a parallel processor system. Naturally, a good partitioning is of key 
importance to achieve high performance ... 

Keywords: Fourier-Motzkin elimination, affine, conditionals, parallel loops, partitioning 
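The simplest instance of the partitioning problem described here is splitting a one-dimensional DOALL loop among `p` processors so that chunk sizes differ by at most one. The Python sketch below assumes a rectangular, conditional-free iteration space and a hypothetical `block_partition` helper; it does not attempt the paper's treatment of N-dimensional spaces with conditionals via Fourier-Motzkin elimination.

```python
from typing import List, Tuple


def block_partition(lo: int, hi: int, p: int) -> List[Tuple[int, int]]:
    """Split the half-open range [lo, hi) into p contiguous chunks
    whose sizes differ by at most one."""
    if p < 1:
        raise ValueError("need at least one processor")
    n = max(hi - lo, 0)
    base, extra = divmod(n, p)
    chunks = []
    start = lo
    for k in range(p):
        size = base + (1 if k < extra else 0)   # the first `extra` chunks get one more
        chunks.append((start, start + size))
        start += size
    return chunks
```

For example, `block_partition(0, 10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`; processor `k` would then run iterations `range(*chunks[k])`.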
One of the most efficient ways to improve program performance on today's
computers is to optimize the way cache memories are used. In particular, many scientific
applications contain loop nests that operate on large multi-dimensional arrays whose sizes 
are often parameterized. No special attention is paid to cache memory performance when 
such loops are written. In this work, we focus on spatial locality optimization such that all 
the data that are loaded as a block in the cache will be used ... 

Keywords: Ehrhart polynomials, cache memory, loop nests, optimizing compiler, 
parameterized polyhedron, program performance optimization, spatial locality 
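To make the spatial-locality issue concrete, this Python sketch lists the element offsets touched by a fixed loop nest (`i` outer, `j` inner) over an `n`-by-`m` array under two storage layouts; all helper names are hypothetical. Under row-major storage consecutive accesses are one element apart, so every element of a loaded cache block is used; under column-major storage the stride is `n`. Either interchanging the loops or changing the array layout restores unit stride. The excerpt does not say which transformation the paper applies, and its Ehrhart-polynomial analysis for parameterized sizes is not shown.

```python
from typing import Callable, List

Layout = Callable[[int, int, int, int], int]


def row_major(i: int, j: int, n: int, m: int) -> int:
    return i * m + j          # rows stored contiguously (n unused, kept for a uniform signature)


def col_major(i: int, j: int, n: int, m: int) -> int:
    return j * n + i          # columns stored contiguously


def access_addresses(n: int, m: int, layout: Layout) -> List[int]:
    """Element offsets touched by: for i in range(n): for j in range(m): use A[i][j]."""
    return [layout(i, j, n, m) for i in range(n) for j in range(m)]


def strides(addrs: List[int]) -> List[int]:
    """Distance between consecutive accesses; 1 means ideal spatial locality."""
    return [b - a for a, b in zip(addrs, addrs[1:])]
```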
This paper presents a novel source code transformation for control flow optimization
called loop nest splitting which minimizes the number of executed if-statements in loop 
nests of embedded multimedia applications. The goal of the optimization is to reduce 
runtimes and energy consumption. The analysis techniques are based on precise 
mathematical models combined with genetic algorithms. Due to the inherent portability of 
source code transformations, a very detailed benchmarking using 10 differen ... 
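A simplified illustration of the control-flow effect that loop nest splitting targets. When an if-condition depends only on loop indices, the iteration space can be split at the point where the condition changes, and the test disappears from the loop body. The Python sketch below is hand-split for a single loop and a hypothetical condition `i < k`; the paper derives split points for whole loop nests using precise mathematical models and genetic algorithms, which is not shown here.

```python
from typing import List


def with_branch(data: List[int], k: int) -> List[int]:
    out = []
    for i, v in enumerate(data):
        if i < k:                 # evaluated on every iteration
            out.append(v * 2)
        else:
            out.append(v + 1)
    return out


def split_loop(data: List[int], k: int) -> List[int]:
    n = len(data)
    cut = max(0, min(k, n))       # number of iterations for which i < k holds
    out = []
    for i in range(cut):          # condition known true: no test needed
        out.append(data[i] * 2)
    for i in range(cut, n):       # condition known false: no test needed
        out.append(data[i] + 1)
    return out
```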
This paper analyzes and quantifies the locality characteristics of numerical loop nests in
order to suggest future directions for architecture and software cache optimizations. Since 
most programs spend the majority of their time in nests, the vast majority of cache 
optimization techniques target loop nests. In contrast, the locality characteristics that 
drive these optimizations are usually collected across the entire application rather than at
the nest level. Indeed, researchers have studied nume ...
1 A study of source-level compiler algorithms for automatic construction of pre-execution code

Dongkeun Kim, Donald Yeung 

August 2004 ACM Transactions on Computer Systems (TOCS), volume 22 issue 3 
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

Pre-execution is a promising latency tolerance technique that uses one or more helper 
threads running in spare hardware contexts ahead of the main computation to trigger 
long-latency memory operations early, hence absorbing their latency on behalf of the 
main computation. This article investigates several source-to-source C compilers for 
extracting pre-execution thread code automatically, thus relieving the programmer or 
hardware from this onerous task. We present an aggressive profile-driven co ... 

Keywords: Data prefetching, memory-level parallelism, multithreading, pre-execution, 
prefetch conversion, program slicing, speculative loop parallelization 
To meet the demands of modern architectures, optimizing compilers must incorporate an
ever larger number of increasingly complex transformation algorithms. Since code 
transformations may often degrade performance or interfere with subsequent 
transformations, compilers employ predictive heuristics to guide optimizations by 
predicting their effects a priori. Unfortunately, the unpredictability of optimization 
interaction and the irregularity of today's wide-issue machines severely limit the accura .. 
While there have been many recent proposals for hardware that supports Thread-Level
Speculation (TLS), there has been relatively little work on compiler optimizations to fully 
exploit this potential for parallelizing programs optimistically. In this paper, we focus on 
one important limitation of program performance under TLS, which is stalls due to 
forwarding scalar values between threads that would otherwise cause frequent data 
dependences. We present and evaluate dataflow algorithms for ... 

Compiling real-time programs into schedulable code
Seongsoo Hong, Richard Gerber
We present a programming language with first-class timing constructs, whose semantics
is based on time-constrained relationships between observable events. Since a system 
specification postulates timing relationships between events, realizing the specification in 
a program becomes a more straightforward process. Using these constraints, as well as 
those imposed by data and control flow properties, our objective is to transform the code 
so that its worst-case execution time is con ... 

Code management: HotpathVM: an effective JIT compiler for resource-constrained devices

Andreas Gal, Christian W. Probst, Michael Franz 
June 2006 Proceedings of the 2nd international conference on Virtual execution 

environments VEE '06 
Publisher: ACM Press 

Full text available: pdf (221.33 KB) Additional Information: full citation, abstract, references, index terms

We present a just-in-time compiler for a Java VM that is small enough to fit on resource- 
constrained devices, yet is surprisingly effective. Our system dynamically identifies traces 
of frequently executed bytecode instructions (which may span several basic blocks across 
several methods) and compiles them via Static Single Assignment (SSA) construction. Our 
novel use of SSA form in this context allows us to hoist instructions across trace side-exits
without necessitating expensive compensation code ... 

Keywords: dynamic compilation, embedded and resource-constrained systems, mixed- 
mode interpretive compiled systems, software trace scheduling, static single assignment 
form, virtual machines 
Embedded systems are often used in safety-critical environments. Thus, thorough testing
of them is mandatory. A quite active research area is the automatic test-case generation
for testing embedded systems. To achieve high retargetability of the testing framework,
the test-case generation has to be done at source-code level. However, it is challenging to
guarantee that the test-cases obtained from the source code are also valid at the object- 
code level, since even in safety-critical domains pr ... 

Keywords: code transformation, compiler, decision coverage, optimization, structural 
code-coverage preservation, testing 
With advances in VLSI technology, microprocessor designers can provide more
microarchitectural parallelism to increase performance. We have identified four major 
forms of such parallelism: multiple microoperations issued per cycle, multiple result 
distribution buses, multiple execution units, and pipelined execution units. The 
experiments reported in this paper address two important issues: The effects of these 
forms and the appropriate balance among them. A central microar ... 
languages and operating systems ASPLOS-X, Volume 37, 36, 30 Issue 10, 5, 5

Publisher: ACM 

Full text available: pdf (1.43 MB) Additional Information: full citation, abstract, references, cited by

Pre-execution is a promising latency tolerance technique that uses one or more helper 
threads running in spare hardware contexts ahead of the main computation to trigger 
long-latency memory operations early, hence absorbing their latency on behalf of the 
main computation. This paper investigates a source-to-source C compiler for extracting 
pre-execution thread code automatically, thus relieving the programmer or hardware from 
this onerous task. At the heart of our compiler are three algorithms. ... 

'C and tcc: a language and compiler for dynamic code generation

Massimiliano Poletto, Wilson C. Hsieh, Dawson R. Engler, M. Frans Kaashoek 

March 1999 ACM Transactions on Programming Languages and Systems (TOPLAS), 

Volume 21 Issue 2 
Publisher: ACM Press 

Full text available: pdf (471.68 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Dynamic code generation allows programmers to use run-time information in order to 
achieve performance and expressiveness superior to those of static code. The 'C (Tick C)
language is a superset of ANSI C that supports efficient and high-level use of dynamic 
code generation. 'C provides dynamic code generation at the level of C expressions and 
statements and supports the composition of dynamic code at run time. These features 
enable programmers to add dynamic code generation ... 

Keywords: ANSI C, compilers, dynamic code generation, dynamic code optimization 



10 Code compression: Compiler optimization and ordering effects on VLIW code
compression

Montserrat Ros, Peter Sutton 

October 2003 Proceedings of the 2003 international conference on Compilers, 

architecture and synthesis for embedded systems CASES '03 
Publisher: ACM Press 

Code size has always been an important issue for all embedded applications as well as 
larger systems. Code compression techniques have been devised as a way of battling
bloated code; however, the impact of VLIW compiler methods and outputs on these 
compression schemes has not been thoroughly investigated. This paper describes the 
application of single- and multiple-instruction dictionary methods for code compression to 
decrease overall code size for the TI TMS320C6xxx DSP family. The compression ... 

Keywords: VLIW, code compression, compiler optimizations 
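A toy version of single-instruction dictionary compression, assuming instructions are integer words and that the `dict_size` most frequent words are replaced by dictionary indices. The token format (tuples tagged `'D'` or `'R'`) and both function names are hypothetical; the sketch ignores bit-level encoding, branch-target patching, and the TMS320C6x execute-packet structure that a real scheme must handle, and it does not cover the multiple-instruction dictionary variant.

```python
from collections import Counter
from typing import List, Tuple

Token = Tuple[str, int]   # ('D', dictionary index) or ('R', raw instruction word)


def compress(code: List[int], dict_size: int) -> Tuple[List[int], List[Token]]:
    # The dictionary holds the most frequent instruction words.
    dictionary = [w for w, _ in Counter(code).most_common(dict_size)]
    index = {w: k for k, w in enumerate(dictionary)}
    stream = [('D', index[w]) if w in index else ('R', w) for w in code]
    return dictionary, stream


def decompress(dictionary: List[int], stream: List[Token]) -> List[int]:
    return [dictionary[v] if tag == 'D' else v for tag, v in stream]
```

Code size drops when each `'D'` token can be encoded in fewer bits than a full instruction word, which is why the dictionary favors the most frequent words.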



11 Compilation: Efficient spill code for SDRAM
V. Krishna Nandivada, Jens Palsberg 

October 2003 Proceedings of the 2003 international conference on Compilers, 

architecture and synthesis for embedded systems CASES '03 
Publisher: ACM Press 

Full text available: pdf (199.32 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Processors such as StrongARM and memory such as SDRAM enable efficient execution of 
multiple loads and stores in a single instruction. This is particularly useful in connection 
with register allocation where spill code may need to save and restore multiple registers. 
Until now, there has been no effective strategy for utilizing this to its full potential. In this 
paper we investigate the use of SDRAM for optimization of spill code. The core of the 
problem is to arrange the variables in the spill ... 

Keywords: SDRAM, integer linear programming, memory layout, optimization 



12 Speculative disambiguation: a compilation technique for dynamic memory
disambiguation

A. S. Huang, G. Slavenburg, J. P. Shen 

April 1994 ACM SIGARCH Computer Architecture News, Proceedings of the 21st

annual international symposium on Computer architecture ISCA '94, volume 
22 Issue 2 

Publisher: IEEE Computer Society Press, ACM 

Full text available: pdf (1.09 MB) Additional Information: full citation, abstract, references, cited by, index terms

Ambiguous memory references have always been one of the main sources of performance 
bottlenecks. Many papers have addressed this problem using static disambiguation. These 
methods work extremely well when the memory access pattern is linear and predictable. 
However, they are ineffective when the memory access pattern is nonlinear or when the
access pattern cannot be determined statically. For these difficult problems, this paper 
presents speculative disambiguation, a compilation technique for arc ... 

13 Avoidance and suppression of compensation code in a trace scheduling compiler
Stefan M. Freudenberger, Thomas R. Gross, P. Geoffrey Lowney 

July 1994 ACM Transactions on Programming Languages and Systems (TOPLAS), 

Volume 16 Issue 4 
Publisher: ACM Press 

Full text available: pdf (3.58 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Trace scheduling is an optimization technique that selects a sequence of basic blocks as a 
trace and schedules the operations from the trace together. If an operation is moved 
across basic block boundaries, one or more compensation copies may be required in the 
off-trace code. This article discusses the generation of compensation code in a trace 
scheduling compiler and presents techniques for limiting the amount of compensation 
code: avoidance (restricting code motion so that no compensatio ... 

Keywords: SPEC89, instruction-level parallelism, performance evaluation, trace 
scheduling 
14 Shangri-La: achieving high performance from compiled network applications while enabling ease of programming

Michael K. Chen, Xiao Feng Li, Ruiqi Lian, Jason H. Lin, Lixia Liu, Tao Liu, Roy Ju

June 2005 ACM SIGPLAN Notices, Proceedings of the 2005 ACM SIGPLAN conference
on Programming language design and implementation PLDI '05, Volume 40 
Issue 6 
Publisher: ACM Press 

Full text available: pdf (480.93 KB) Additional Information: full citation, abstract, references, citings, index terms

Programming network processors is challenging. To sustain high line rates, network 
processors have extremely tight memory access and instruction budgets. Achieving 
desired performance has traditionally required hand-coded assembly. Researchers have 
recently proposed high-level programming languages for packet processing, but the 
challenges of compiling these languages into code that is competitive with hand-tuned 
assembly remain unanswered. This paper describes the Shangri-La compiler, which 
acce ... 



Keywords: chip multiprocessors, dataflow programming, network processors, packet 
processing, program partitioning, throughput-oriented computing 
compilation via fast code analysis and bytecode tracing
G. Agosta, S. Crespi Reghizzi, P. Palumbo, M. Sykora 

April 2006 Proceedings of the 2006 ACM symposium on Applied computing SAC '06 
Publisher: ACM Press 

Full text available: pdf (265.02 KB) Additional Information: full citation, abstract, references, index terms

Modern Java Virtual Machines (JVM) commonly adopt Just-In-Time (JIT) compilation to 
speed up the execution of Java Bytecode. However, the effort of compiling a region of 
code is only worthwhile if the code is frequently executed. Therefore, Selective Compilation is
employed so that the JIT compiler is only invoked on those regions of code where most of 
the computation is performed (hot spots). The core task in Selective Compilation is to 
correctly identify the hot spots in a program. In our Se ... 



16 Compilation/code generation: A simplified Java bytecode compilation system for
resource-constrained embedded processors
Carmen Badea, Alexandru Nicolau, Alexander V. Veidenbaum 

September 2007 Proceedings of the 2007 international conference on Compilers, 

architecture, and synthesis for embedded systems CASES '07 
Publisher: ACM Press 

Full text available: pdf (439.70 KB) Additional Information: full citation, abstract, references, index terms

Embedded platforms are resource-constrained systems in which performance and memory
requirements of executed code are of critical importance. However, standard techniques
such as full just-in-time (JIT) compilation and/or adaptive optimization (AO) may not be
appropriate for this type of system due to memory and compilation overheads.

The research presented in this paper proposes a technique that combines some of the 
main benefits of JIT compilation, superoperators (SOs) and profile-guide ...

Keywords: adaptive optimization, embedded systems, java virtual machine, profile- 
guided optimization, superoperators 



17 Formal certification of a compiler back-end or: programming a compiler with a proof assistant
Xavier Leroy


January 2006 ACM SIGPLAN Notices, Conference record of the 33rd ACM SIGPLAN-

SIGACT symposium on Principles of programming languages POPL '06, 

Volume 41 Issue 1 
Publisher: ACM Press 

Full text available: pdf (187.24 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper reports on the development and formal certification (proof of semantic 
preservation) of a compiler from Cminor (a C-like imperative language) to PowerPC 
assembly code, using the Coq proof assistant both for programming the compiler and for 
proving its correctness. Such a certified compiler is useful in the context of formal 
methods applied to the certification of critical software: the certification of the compiler 
guarantees that the safety properties proved on the source code hold f ... 

Keywords: certified compilation, compiler transformations and optimizations, program 
proof, semantic preservation, the Coq theorem prover 



18 Virtual machines and compilation: Implementing fast JVM interpreters using Java itself

Michael Bebenita, Andreas Gal, Michael Franz

September 2007 Proceedings of the 5th international symposium on Principles and 

practice of programming in Java PPPJ '07 
Publisher: ACM Press 

Full text available: pdf (741.09 KB) Additional Information: full citation, abstract, references, index terms

Most Java Virtual Machines (JVMs) are themselves written in unsafe languages, making it 
unduly difficult to build trustworthy and safe JVM platforms. While some progress has 
been made on removing compilers from the trusted computing base (using certifying 
compilation), JVM interpreters continue to be built almost exclusively in C/C++. We have
implemented an alternative approach, in which the JVM interpreter itself is built in Java, 
and runs atop a host JVM execution environment. ... 

Keywords: Java virtual machine, interpreter design, metacircular/self-interpreters, 
minimal trusted computing base 
Michael Schlansker, Vinod Kathail

December 1995 Proceedings of the 28th annual international symposium on 
Publisher: IEEE Computer Society Press
20 Building Intrusion-Tolerant Secure Software
Tao Zhang, Xiaotong Zhuang, Santosh Pande 

March 2005 Proceedings of the international symposium on Code generation and 
optimization CGO '05 

Publisher: IEEE Computer Society 

Full text available: pdf (234.98 KB) Additional Information: full citation, abstract, index terms

In this work, we develop a secret sharing based compiler solution to achieve 
confidentiality, integrity and availability (intrusion tolerance) of critical data together, 
rather than tackling them one by one as in previous approaches. Under our scheme, some 
critical data values are automatically identified by the compiler, whereas some others are 
specified by the user. The compiler generates code for scattering/assembling and 
verifying of those critical data values using secret sharing scheme. In ... 
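The abstract does not name the secret sharing scheme, so as an assumption the sketch below uses Shamir's (k, n) threshold scheme over a prime field to show what generated "scatter" and "assemble" steps could compute. The prime `P`, the function names, and the omission of the verification step are illustrative choices, not the paper's design. Randomness comes from Python's `secrets` module, and the modular inverse uses three-argument `pow` with exponent -1 (Python 3.8 or later).

```python
import secrets
from typing import List, Tuple

P = 2**61 - 1   # a Mersenne prime; every secret must be smaller than P


def scatter(secret: int, k: int, n: int) -> List[Tuple[int, int]]:
    """Split `secret` into n shares such that any k of them reconstruct it."""
    if not (0 <= secret < P and 1 <= k <= n):
        raise ValueError("invalid secret or threshold")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation at x
            y = (y * x + c) % P
        shares.append((x, y))
    return shares


def assemble(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret
```

With `k = 3, n = 5`, losing any two shares still leaves enough to reconstruct the value, while fewer than three shares reveal nothing about it; this is the property that gives availability and confidentiality together.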
1 Specification and architecture challenges in high-level synthesis: A code refinement
methodology for performance-improved synthesis from C
Greg Stitt, Frank Vahid, Walid Najjar

November 2006 Proceedings of the 2006 IEEE/ACM international conference on 

Computer-aided design ICCAD '06 
Publisher: ACM Press 

Full text available: pdf (219.07 KB) Additional Information: full citation, abstract, references, index terms

Although many recent advances have been made in hardware synthesis techniques from 
software programming languages such as C, the performance of synthesized hardware 
commonly suffers due to the use of C constructs and coding practices that are not 
appropriate for hardware. Most previous approaches to addressing this problem require 
drastic changes to coding practice. We present an approach that instead requires only 
minimal changes but yields significant speedups. In this approach, a software ... 

Keywords: FPGA, code refinement, coding guidelines, compilation, embedded systems, 
hardware/software partitioning, synthesis 



2 Inline function expansion for compiling C programs
P. P. Chang, W.-W. Hwu

June 1989 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN 1989 Conference 
on Programming language design and implementation PLDI '89, volume 24 
Issue 7 
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

Inline function expansion replaces a function call with the function body. With automatic 
inline function expansion, programs can be constructed with many small functions to 
handle complexity and then rely on the compilation to eliminate most of the function calls. 
Therefore, inline expansion serves as a tool for satisfying two conflicting goals: minimizing the
complexity of program development and minimizing the function call overhead of
program execution. A simple inline expansion procedur ...
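As a hedged illustration of the decision an automatic inliner faces, the Python sketch below greedily picks call sites in order of profiled execution count until a code-growth budget is used up. The `CallSite` record, the budget rule, and the greedy order are hypothetical simplifications; the abstract is cut off before the paper's own procedure is described. A production inliner also has to deal with indirect recursion, calls through pointers, and callers whose size has already grown from earlier inlining.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallSite:
    caller: str
    callee: str
    count: int          # profiled number of executions of this call
    callee_size: int    # size of the callee body, e.g. in IR instructions


def select_inlines(sites: List[CallSite], budget: int) -> List[CallSite]:
    """Greedily inline the hottest call sites while total code growth fits the budget."""
    chosen = []
    growth = 0
    for site in sorted(sites, key=lambda s: s.count, reverse=True):
        if site.caller == site.callee:
            continue                    # never inline direct recursion
        if growth + site.callee_size <= budget:
            chosen.append(site)
            growth += site.callee_size
    return chosen
```

With a budget of 40, sites `main->f` (count 1000, size 10), `f->f` (500, 10), `main->h` (200, 30) and `main->g` (5, 50) yield inlining of `f` and `h`: the recursive site is skipped and `g` would exceed the budget.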






Function unit specialization through code analysis
Daniel Benyamin, William H. Mangione-Smith 

November 1999 Proceedings of the 1999 IEEE/ACM international conference on 

Computer-aided design ICCAD '99 
Publisher: IEEE Press 

Full text available: pdf(105.29 KB) Additional Information: full citation, abstract, references, index terms
Many previous attempts at ASIP synthesis have employed template matching techniques
to target function units to application code, or directly design new units to extract
maximum performance. This paper presents an entirely new approach to specializing
hardware for application-specific needs. In our framework of a parameterized VLIW
processor, we use a post-modulo scheduling analysis to reduce the allocated hardware
resources while increasing the code's performance. Initial results i ...

4 INSIDE: INstruction Selection/Identification & Design Exploration for Extensible
Processors

Newton Cheung, Sri Parameswaran, Jörg Henkel

November 2003 Proceedings of the 2003 IEEE/ACM international conference on 
Computer-aided design ICCAD '03 

Publisher: IEEE Computer Society 

Full text available: pdf(248.23 KB) Additional Information: full citation, abstract, citings, index terms

This paper presents the INSIDE system that rapidly searches the design space for
extensible processors, given area and performance constraints of an embedded
application, while minimizing the design turn-around-time. Our system consists of a) a
methodology to determine which code segments are most suited for implementation as a
set of extensible instructions, b) a heuristic algorithm to select pre-configured
extensible processors as well as extensible instructions (library), and c) an estimation
tool ...

5 EPIC compilation: Inlining of mathematical functions in HP-UX for Itanium® 2
James W. Thomas 

March 2003 Proceedings of the international symposium on Code generation and
optimization: feedback-directed and runtime optimization CGO '03
Publisher: IEEE Computer Society 

Full text available: pdf(783.82 KB) Additional Information: full citation, abstract, references, index terms

HP-UX compilers inline mathematical functions for Itanium Processor Family (IPF) systems 
to improve throughput 4X-8X versus external library calls, achieving speeds comparable 
to highly tuned vector functions, without requiring the user to code for a vector interface 
and without sacrificing accuracy or edge-case behaviors. This paper highlights IPF 
architectural features that support implementation of high-performance, high-quality 
math functions for inlining. It discusses strategies for utilizi ... 

6 Survey of code-size reduction methods
Arpad Beszedes, Rudolf Ferenc, Tibor Gyimothy, Andre Dolenc, Konsta Karsisto
September 2003 ACM Computing Surveys (CSUR), volume 35 issue 3 

Publisher: ACM Press 

Full text available: pdf(443.89 KB) Additional Information: full citation, abstract, references, citings, index terms

Program code compression is an emerging research activity that is having an impact in 
several production areas such as networking and embedded systems. This is because the 
reduced-sized code can have a positive impact on network traffic and embedded system 
costs such as memory requirements and power consumption. Although code-size 
reduction is a relatively new research area, numerous publications already exist on it. The 
methods published usually have different motivations and a variety of appli ... 

Keywords: code compaction, code compression, method assessment, method evaluation 
The market demand for portable multimedia applications has exploded in recent
years. Unfortunately, for such applications current compilers and software optimization
methods often require designers to do part of the optimization manually. Specifically,
the high-level arithmetic optimizations and the use of complex instructions are left to the
designers' ingenuity. In this paper, we present a tool flow, SymSoft, that automates
the optimization of power-intensive algorithmic constructs using symbolic a ...

8 Compiling real-time programs into schedulable code
Seongsoo Hong, Richard Gerber 

June 1993 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1993 conference
on Programming language design and implementation PLDI '93, Volume 28 
Issue 6 
Publisher: ACM Press 

Full text available: pdf(1.06 MB) Additional Information: full citation, abstract, references, citings, index terms

We present a programming language with first-class timing constructs, whose semantics 
is based on time-constrained relationships between observable events. Since a system 
specification postulates timing relationships between events, realizing the specification in 
a program becomes a more straightforward process. Using these constraints, as well as 
those imposed by data and control flow properties, our objective is to transform the code 
so that its worst-case execution time is con ... 

9 Developing safety critical software for an unmanned aerial vehicle situational
awareness tool
Ricky E. Sward, Mark Gerken 

November 2006 Proceedings of the 2006 annual ACM SIGAda international conference 
on Ada SIGAda '06 

Publisher: ACM Press 

Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, references, index terms

In this paper, we describe our application of the SPARK programming language to the 
development of flight control software for an Unmanned Aerial Vehicle (UAV). The SPARK 
language was used during a senior-level software engineering course at the US Air Force 
Academy. This paper uses the year-long project from this course as an example 
application of SPARK. The process we used to build an interface between C++ and Ada is
discussed along with our experiences with using SPARK. 

Keywords: SPARK, UAV, formal methods, high integrity, safety critical, unmanned aerial 
vehicle 



10 Code optimization - 1: Compiler optimization-space exploration

Spyridon Triantafyllis, Manish Vachharajani, Neil Vachharajani, David I. August 

March 2003 Proceedings of the international symposium on Code generation and
optimization: feedback-directed and runtime optimization CGO '03
Publisher: IEEE Computer Society 

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, references, citings, index terms
To meet the demands of modern architectures, optimizing compilers must incorporate an
ever larger number of increasingly complex transformation algorithms. Since code 
transformations may often degrade performance or interfere with subsequent 
transformations, compilers employ predictive heuristics to guide optimizations by 
predicting their effects a priori. Unfortunately, the unpredictability of optimization 
interaction and the irregularity of today's wide-issue machines severely limit the accura ...
October 2002 Proceedings of the 15th international symposium on System Synthesis
ISSS '02
Publisher: ACM Press 

Full text available: pdf(492.13 KB) Additional Information: full citation, abstract, references, index terms

Nowadays, new DSP applications are offering combined and flexible multimedia and 
telecom services. VLIW processor architectures, which include dedicated but inflexible 
functional units, are usually tuned to a single specific application. In order to accelerate a 
wide range of applications, we propose a VLIW processor containing a novel run-time 
reconfigurable functional unit (RC-FU). Only a few hundred bits and few cycles are 
necessary to configure a new coarse-grain operation on the RC-FU unit. ... 

Keywords: VLIW processors, architectural synthesis, reconfigurable logic 



12 Static resource models for code-size efficient embedded processors
Qin Zhao, Bart Mesman, Twan Basten

May 2003 ACM Transactions on Embedded Computing Systems (TECS), Volume 2 issue 2
Publisher: ACM Press 

Full text available: pdf(651.62 KB) Additional Information: full citation, abstract, references, index terms

Due to an increasing need for flexibility, embedded systems embody more and more 
programmable processors as their core components. Due to silicon area and power 
considerations, the corresponding instruction sets are often highly encoded to minimize 
code size for given performance requirements. This has hampered the development of 
robust optimizing compilers because the resulting irregular instruction set architectures 
are far from convenient compiler targets. Among other considerations, they int ... 

Keywords: Static resource models, constraint analysis, convex hull, phase coupling, 
scheduling 



13 Functional Equivalence Checking for Verification of Algebraic Transformations on
Array-Intensive Source Code 

K. C. Shashidhar, Maurice Bruynooghe, Francky Catthoor, Gerda Janssens 
March 2005 Proceedings of the conference on Design, Automation and Test in Europe 
- Volume 2 DATE '05 

Publisher: IEEE Computer Society 

Full text available: pdf(177.24 KB) Additional Information: full citation, abstract, index terms

Development of energy and performance-efficient embedded software is increasingly 
relying on application of complex transformations on the critical parts of the source code. 
Designers applying such nontrivial source code transformations are often faced with the 
problem of ensuring functional equivalence of the original and transformed programs. 
Currently they have to rely on incomplete and time-consuming simulation. Formal 
automatic verification of the transformed program against the original is ... 

14 Embedded software automation: from specification to binary: Complex library
mapping for embedded software using symbolic algebra

Armita Peymandoust, Giovanni De Micheli, Tajana Simunic 

June 2002 Proceedings of the 39th conference on Design automation DAC '02
Publisher: ACM Press 

Full text available: pdf(95.53 KB) Additional Information: full citation, abstract, references, index terms

Embedded software designers often use libraries that have been pre-optimized for a given 
processor to achieve higher code quality. However, using such libraries in legacy code 
optimization is nontrivial and typically requires manual intervention. This paper presents a 
methodology that maps algorithmic constructs of the software specification to a library of 
complex software elements. This library-mapping step is automated by using symbolic 
algebra techniques. We illustrate the advantages of our ... 
Keywords: automated library mapping, computation intensive software, embedded
software optimization, polynomial representation, symbolic algebra 
Manuel Costa, Jon Crowcroft, Miguel Castro, Antony Rowstron, Lidong Zhou, Lintao Zhang,
Paul Barham

October 2005 ACM SIGOPS Operating Systems Review, Proceedings of the twentieth
ACM symposium on Operating systems principles SOSP '05, volume 39 issue 
Publisher: ACM Press

Full text available: pdf(329.29 KB) Additional Information: full citation, abstract, references, citings, index terms
Worm containment must be automatic because worms can spread too fast for humans to
respond. Recent work has proposed network-level techniques to automate worm 
containment; these techniques have limitations because there is no information about the 
vulnerabilities exploited by worms at the network level. We propose Vigilante, a new end- 
to-end approach to contain worms automatically that addresses these limitations. 
Vigilante relies on collaborative worm detection at end hosts, but does not requir ... 

Keywords: control flow analysis, data flow analysis, self-certifying alerts, worm 
containment 
March 2005 Proceedings of the international symposium on Code generation and
optimization CGO '05
Publisher: IEEE Computer Society 
In this work, we develop a secret sharing based compiler solution to achieve
confidentiality, integrity and availability (intrusion tolerance) of critical data together,
rather than tackling them one by one as in previous approaches. Under our scheme, some
critical data values are automatically identified by the compiler, whereas some others are
specified by the user. The compiler generates code for scattering/assembling and
verifying of those critical data values using a secret sharing scheme. In ...
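The abstract does not say which secret sharing scheme the compiler emits, so the sketch below swaps in the simplest one, n-out-of-n XOR splitting, purely to show what scattering and assembling a critical value means. All names (`scatter`, `assemble`, `NUM_SHARES`) are hypothetical, and `rand()` is not a cryptographically secure source; it is used only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_SHARES 3 /* hypothetical number of shares */

/* Scatter: split `secret` into NUM_SHARES shares whose XOR equals the
   secret. With a good random source, any NUM_SHARES - 1 shares alone
   reveal nothing about the secret. */
void scatter(uint32_t secret, uint32_t shares[NUM_SHARES]) {
    assert(shares != NULL);
    uint32_t acc = secret;
    for (int i = 0; i < NUM_SHARES - 1; i++) {
        /* rand() is NOT cryptographically secure; illustration only. */
        uint32_t r = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
        shares[i] = r;
        acc ^= r;
    }
    shares[NUM_SHARES - 1] = acc; /* last share makes the XOR come out right */
}

/* Assemble: XOR all shares together to recover the secret. */
uint32_t assemble(const uint32_t shares[NUM_SHARES]) {
    uint32_t value = 0;
    for (int i = 0; i < NUM_SHARES; i++)
        value ^= shares[i];
    return value;
}
```

XOR splitting needs every share to reconstruct the value, so on its own it offers confidentiality but neither the integrity checking nor the availability (tolerating lost shares) that the abstract targets; threshold schemes such as Shamir's secret sharing are the usual way to obtain the latter.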

17 Experience with access functions in an experimental compiler 
Frederic N. Ris 

September 1984 ACM SIGMICRO Newsletter, Volume 15 issue 3
Publisher: ACM Press 

Full text available: pdf(1.12 MB) Additional Information: full citation, abstract, references

This paper describes an access function subsystem embedded in portions of an 
experimental microcode compiler which was built and used during 1973—6 using the IBM 
PL/I optimizing compiler under VM/370 and CMS. The use of the access function 
subsystem in this context was itself an experiment, performed by a group for all of whom 
PL/I was a new language and VM/370 a new operating system. The implementation of the 
subsystem was done strictly within the confines of the PL/I language. The basic objec ... 

18 Binary synthesis
Greg Stitt, Frank Vahid 

August 2007 ACM Transactions on Design Automation of Electronic Systems
(TODAES), Volume 12 Issue 3 
Publisher: ACM Press 

Full text available: pdf(341.48 KB) Additional Information: full citation, abstract, references, index terms

Recent high-level synthesis approaches and C-based hardware description languages 
attempt to improve the hardware design process by allowing developers to capture 
desired hardware functionality in a well-known high-level source language. However,
these approaches have yet to achieve wide commercial success due in part to the 
difficulty of incorporating such approaches into software tool flows. The requirement of 
using a specific language, compiler, or development environment may cause many so ... 

Keywords: Binary synthesis, FPGA, configurable logic, hardware/software codesign, 
hardware/software partitioning, synthesis from software binaries, warp processors 
Engelen, Kyle Gallivan, Jason Hiser, Jack Davidson, Baosheng Cai, Mark Bailey, Hwashin
Moon, Kyunghwan Cho, Yunheung Paek 

November 2006 ACM Transactions on Embedded Computing Systems (TECS), volume 5
Issue 4

Publisher: ACM Press 

Full text available: pdf(4.01 MB) Additional Information: full citation, abstract, references, index terms

Software designers face many challenges when developing applications for embedded 
systems. One major challenge is meeting the conflicting constraints of speed, code size, 
and power consumption. Embedded application developers often resort to hand-coded 
assembly language to meet these constraints since traditional optimizing compiler 
technology is usually of little help in addressing this challenge. The results are software 
systems that are not portable, less robust, and more costly to develop an ... 

Keywords: User-directed code improvement, genetic algorithms, interactive compilation, 
phase ordering 



20 A Code Transformation-Based Methodology for Improving I-Cache Performance of
DSP Applications

N. Liveris, N. Zervas, D. Soudris, C. Goutis 

March 2002 Proceedings of the conference on Design, automation and test in Europe 
DATE '02 

Publisher: IEEE Computer Society 

Full text available: pdf(105.95 KB) Additional Information: full citation, abstract, citings

This paper focuses on I-cache behaviour enhancement through the application of high-level
code transformations. Specifically, a flow for the iterative application of the I-Cache
performance-optimizing transformations is proposed. The procedure of applying
transformations is driven by a set of analytical equations, which receive parameters related
to code and I-cache structure and predict the number of I-cache misses. Experimental
results from a real-life demonstration application show that order of magnit ...
Application-level networking is a promising software organization for improving
performance and functionality for important network services. The Xok/ExOS exokernel
system includes application-level support for standard network services, while at the
same time allowing application writers to specialize networking services. This paper
describes how Xok/ExOS's kernel mechanisms and library operating system organization
achieve this flexibility, and retrospectively shares our experiences an ...
In order to accommodate the spectrum of configuration options currently required for
competitive system infrastructures, many systems make heavy use of C
preprocessor-controlled conditional compilation. Inherent costs associated with this heavy
preprocessor usage include both the impaired readability of the base system and the
reduced reusability of the configuration code.

Our proposed solution, C-CLR, allows developers to sift through views of a system based 
on configuration o ... 

Keywords: aspect-oriented programming, conditional compilation, modularization, 
preprocessor directives, structured programming, system configuration, tools 
We introduce server operating systems, which are sets of abstractions and runtime
support for specialized, high-performance server applications. We have designed and are 
implementing a prototype server OS with support for aggressive specialization, direct 
device-to-device access, an event-driven organization, and dynamic compiler-assisted 
ILP. Using this server OS, we have constructed an HTTP server that outperforms servers 
running on a conventional OS by more than an order of magnitude and that ... 
Refactoring usually involves statically analyzing source code to understand which
transformations safely preserve execution behavior of the program. However, static 
analysis may not scale well for large programs when analysis results are too general, 
when tools for analyzing the source code are unwieldy, or when the tools simply do not 
exist. In such cases, it can be simpler to analyze the program at runtime to gather 
answers needed for safe code changes. I show how dynamic data can guide refact ... 

Keywords: case study, gcc, meaning-preserving restructuring 
29 Issue 1

Publisher: ACM Press 

Full text available: pdf(1.08 MB) Additional Information: full citation, abstract, references, citings

The simulated evaluation of memory management policies relies on reference traces— logs 
of memory operations performed by running processes. No existing approach to reference 
trace collection is applicable to a complete system, including the kernel and all processes. 
Specifically, none gather sufficient information for simulating the virtual memory 
management, the filesystem cache management, and the scheduling of a
multiprogrammed, multithreaded workload. Existing trace collectors are al ... 

8 Distributed System V IPC in LOCUS: a design and implementation retrospective
B. D. Fleisch

August 1986 ACM SIGCOMM Computer Communication Review, Proceedings of the
ACM SIGCOMM conference on Communications architectures & protocols
SIGCOMM '86, Volume 16 Issue 3
Publisher: ACM Press 

Full text available: pdf(1.30 MB) Additional Information: full citation, abstract, references, citings, index
This paper describes new interprocess communication facilities that have been added to
the Locus system [POPEK 81][WALKER 83]. The facilities improve Locus's interprocess
communication repertoire by providing distributed support for three separate subsystems
from System V UNIX: messages, semaphores, and shared memory. Here we describe
these subsystems and their integration into the Locus architecture.

9 Removing implementation details from C++ class declarations
Mark R. Headington

March 1995 ACM SIGCSE Bulletin, Proceedings of the twenty-sixth SIGCSE technical
symposium on Computer science education SIGCSE '95, volume 27 issue 1
Full text available: pdf(460.37 KB) Additional Information: full citation, abstract, references, index terms

Data abstraction, a concept introduced at varying places in the CS1/CS2/CS7 sequence,
separates the properties of a data type (its values and operations) from the
implementation of that type. This separation of specification from implementation is 
achieved by encapsulating the implementation so that users of the type can neither 
access nor be influenced by the implementation details. Ideally, therefore, the 
specification should be implementation-independent. The C++ clas ... 

10 How do you release a product implemented in Ada?: making compatible interface
changes with the BiiN Ada compiler

David B. Kinder 

July 1989 Proceedings of the sixth Washington Ada symposium on Ada WADAS '89 
Full text available: pdf(1.13 MB) Additional Information: full citation, abstract, references, citings

This paper describes mechanisms that provide the user of a filing system the dynamic 
facility for defining a scope within which backing out can be done on request. Check points 
(defining the beginning of a new scope) can dynamically be established and procedures 
for 'acceptance' (at the end of the scope) or 'undoing' (within or at the end of the scope) 
can be invoked. These scopes can be nested. It is also shown that these mechanisms can 
be used to provide crash resistance. After a crash the syste ... 

Keywords: audit trail, backing out, consistency, crash resistance, error recovery, fault
tolerance, filing system, recovery block, recovery cache (= recursive cache)
Full text available: pdf(196.15 KB) Additional Information: full citation, abstract, references, citings, index terms

This article presents the design, implementation, and evaluation of IO-Lite, a unified I/O
buffering and caching system for general-purpose operating systems. IO-Lite unifies all 
buffering and caching in the system, to the extent permitted by the hardware. In 
particular, it allows applications, the interprocess communication system, the file system, 
the file cache, and the network subsystem to safely and concurrently share a single 
physical copy of the data. Protection and ... 

Keywords: I/O buffering, caching, networking, zero-copy 
Full text available: pdf(260.68 KB) Additional Information: full citation, abstract, references, index terms

Individualized exercises are a promising feature in promoting modern e-learning. The 
focus of this article is on the QuizPACK system, which is able to generate parameterized 
exercises for the C language and automatically evaluate the correctness of student 
answers. We introduce QuizPACK and present the results of its comprehensive classroom 
evaluation during four consecutive semesters. Our studies demonstrate that when 
QuizPACK is used for out-of-class self-assessment, it is an exceptional learn ... 

Keywords: E-learning, assessment, classroom study, code execution, individualized 
exercises, introductory programming, parameterized questions 

16 Computer education II: Computer tutoring for programming education
Susan M. Eitelman

March 2006 Proceedings of the 44th annual Southeast regional conference ACM-SE 44

Publisher: ACM Press 

Full text available: pdf(158.22 KB) Additional Information: full citation, abstract, references, index terms
http://portal.acm.org/resultsxfm?coll=ACM&dl=ACM& 12/3/2007 



Results (page 1): -fcritical +code +header +file 



Page 5 of 6 



Software is increasingly pervasive in the products we use. Consequently, more 
programmers are needed to develop the software. However, there is also an unmet 
demand on programming instructors. One possible solution to the increased demand is to 
complement human teaching with automated computer tutoring. Several examples of 
such computer tutors for programming already exist; however, they have not found
widespread success. In the operational world, there are several job-aids that support
programme ... 

Keywords: intelligent tutoring 
We present HeapSafe, a tool that uses reference counting to dynamically verify the
soundness of manual memory management of C programs. HeapSafe relies on a simple
extension to the usual malloc/free memory management API: delayed free scopes during
which otherwise dangling references can exist. Porting programs for use with HeapSafe
typically requires little effort (on average 0.6% of lines change), adds an average 11%
time overhead (84% in the worst case), and increases space usage by an avera ...

Keywords: C, memory management, reference counting, safety 
Full text available: pdf Additional Information: full citation, abstract, citings

Programming, understanding, and tuning the performance of large multiprocessor 
systems is challenging. Experts have difficulty achieving good utilization for applications 
on large machines. The task of implementing a scalable system such as an operating 
system or database on large machines is even more challenging. And the importance of 
achieving good performance on multiprocessor machines is increasing as the number of 
cores per chip increases and as the size of multiprocessors increases. Cruci ... 

19 How to port Linux when the hardware turns soft 
David Lynch 